This notebook covers how to get data from Elasticsearch via eland (https://github.com/elastic/eland) and discusses different approaches to visualizing small multiples of binned histograms with eland and Altair. It also covers the basic requirements to create and publish a custom visualization from Jupyter to Kibana.
import pandas as pd
import altair as alt
import eland as ed
import json
import numpy as np
import matplotlib.pyplot as plt
alt.data_transformers.disable_max_rows()
df = ed.DataFrame('localhost:9200', 'kibana_sample_data_flights')
df.head()
df.info()
The following code is short and concise. However, we cannot deploy these bitmap-based charts to Kibana and make them dynamic. And as of Elastic Stack 7.9, these kinds of charts (both small multiples and binned histograms) are a bit cumbersome to create with native Kibana tools.
df_number = df.select_dtypes(include=np.number)
df_number.hist(figsize=[6,10])
plt.show()
Let's try to create the same chart type using Altair. Here's a basic example with dummy data:
source = pd.DataFrame({
'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
alt.Chart(source, height=160, width=120).mark_bar().encode(
x='a',
y='b'
)
To move forward, we convert the eland-based data frame to a native pandas one. Note that this requires more memory, because the data is now held locally.
df_altair = ed.eland_to_pandas(df_number)
df_altair.info()
df_altair.head()
The above cell shows that the data frame has the same schema as the data stored in Elasticsearch. This "wide-form" format could be used as-is, but it's more cumbersome to work with in Altair/Vega. For more details, see Altair's docs on "Long-form vs. Wide-form Data": https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data
The next cell converts the data frame to "long form", which makes it easier to split the data by an entity/attribute.
df_melt = df_altair.melt(var_name='attribute', value_name='value')
df_melt.head()
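As a toy illustration (dummy values, not the flights index), `.melt()` stacks each wide-form column into `(attribute, value)` rows:

```python
import pandas as pd

# Wide form: one column per attribute (toy values)
wide = pd.DataFrame({'AvgTicketPrice': [500.0, 750.0],
                     'FlightDelayMin': [0.0, 15.0]})

# Long form: one row per (attribute, value) pair
long_form = wide.melt(var_name='attribute', value_name='value')
# 4 rows: the two AvgTicketPrice values first, then the two FlightDelayMin values
print(long_form)
```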
Finally, we're able to recreate the original chart with Altair/Vega:
data = df_melt
chart = alt.Chart(data).mark_bar().encode(
alt.X('value:Q', bin=True, title=''),
alt.Y('count()', title=''),
tooltip=[
alt.Tooltip('value:Q', bin=True, title='x'),
alt.Tooltip('count()', title='y')
]
).properties(
width=130,
height=130
)
alt.ConcatChart(
concat=[
chart.transform_filter(alt.datum.attribute == value).properties(title=value)
for value in sorted(data.attribute.unique())
],
columns=3
).resolve_axis(
x='independent',
y='independent'
).resolve_scale(
x='independent',
y='independent'
)
To achieve the same with the data still in Elasticsearch, we change the example to load the data via a remote URL.
Note this is meant as an intermediate step towards our final chart specification, for demonstration purposes. It doesn't work with Elasticsearch security enabled, so take care. You'll need to add the following settings to your elasticsearch.yml; again, take care, this isn't recommended for production configs at all:
xpack.security.enabled: false
http.cors.allow-origin: "/.*/"
http.cors.enabled: true
The other difference to the previous code is that instead of pandas' .melt() we use Altair's .transform_fold() (Vega-Lite's fold transform) to reshape the data from wide to long form.
The important bit is that we use only Altair transforms for the data reshaping, not raw Python or pandas code; otherwise we wouldn't be able to publish the chart specification to Kibana later on.
url = 'http://localhost:9200/kibana_sample_data_flights/_search?size=10000'
url_data = alt.Data(url=url, format=alt.DataFormat(property='hits.hits',type='json'))
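The property='hits.hits' setting tells Vega where the row array lives inside the Elasticsearch response. Here's a trimmed, hypothetical example of that response shape, showing why each field later needs to be addressed as datum._source.&lt;field&gt;:

```python
import json

# A trimmed example of what the _search endpoint returns (hypothetical values)
sample_response = json.loads('''
{
  "hits": {
    "total": {"value": 1},
    "hits": [
      {"_index": "kibana_sample_data_flights",
       "_id": "abc123",
       "_source": {"AvgTicketPrice": 500.0, "dayOfWeek": 3}}
    ]
  }
}
''')

# property='hits.hits' makes Vega treat this array as the rows
rows = sample_response['hits']['hits']
# Each field then lives under _source, which is why the next cell maps
# every field name to 'datum._source.<field>'
print(rows[0]['_source']['AvgTicketPrice'])  # 500.0
```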
fields = [
'AvgTicketPrice',
'DistanceKilometers',
'DistanceMiles',
'FlightDelayMin',
'FlightTimeMin',
'dayOfWeek'
]
rename_dict = dict((a, 'datum._source.'+a) for a in fields)
url_chart = alt.Chart(url_data).transform_calculate(**rename_dict).transform_fold(
fields,
as_=['attribute', 'value']
).mark_bar().encode(
alt.X('value:Q', bin=True, title=''),
alt.Y('count()', title=''),
tooltip=[
alt.Tooltip('value:Q', bin=True, title='x'),
alt.Tooltip('count()', title='y')
]
).properties(
width=150,
height=150
)
url_charts = alt.ConcatChart(
concat=[
url_chart.transform_filter(alt.datum.attribute == attribute).properties(title=attribute)
for attribute in sorted(fields)
],
columns=2
).resolve_axis(
x='independent',
y='independent'
).resolve_scale(
x='independent',
y='independent'
)
url_charts
Next we pick up the spec from the chart above, apply some options, and save it as a Saved Object in Kibana.
import datetime

def saveVegaVis(client, index, visName, altairChart):
chart_json = json.loads(altairChart.to_json())
chart_json['data']['url'] = {
"%context%": True,
"index": index,
"body": {
"size": 10000
}
}
visState = {
"type": "vega",
"aggs": [],
"params": {
"spec": json.dumps(chart_json, sort_keys=True, indent=4, separators=(',', ': ')),
},
"title": visName
}
visSavedObject={
"visualization" : {
"title" : visName,
"visState" : json.dumps(visState, sort_keys=True, indent=4, separators=(',', ': ')),
"uiStateJSON" : "{}",
"description" : "",
"version" : 1,
"kibanaSavedObjectMeta" : {
"searchSourceJSON" : json.dumps({
"query": {
"language": "kuery",
"query": ""
},
"filter": []
}),
}
},
"type" : "visualization",
"references" : [ ],
"migrationVersion" : {
"visualization" : "7.7.0"
},
"updated_at" : datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S.000Z")
}
return client.index(index='.kibana',id='visualization:'+visName,body=visSavedObject)
import datetime
# Import Elasticsearch package
from elasticsearch import Elasticsearch
# Connect to the Elasticsearch cluster
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
saveVegaVis(es, 'kibana_sample_data_flights', 'def-vega-1', url_charts)
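To confirm the write went through, you can fetch the saved object back by its id. saveVegaVis keys documents in the .kibana index as 'visualization:&lt;name&gt;', which a small helper (hypothetical, for illustration) makes explicit:

```python
def vega_vis_id(vis_name):
    # Saved-object documents in the .kibana index are keyed as
    # 'visualization:<name>' by saveVegaVis above
    return 'visualization:' + vis_name

# Usage against a live cluster (requires Elasticsearch running locally):
#   doc = es.get(index='.kibana', id=vega_vis_id('def-vega-1'))
#   print(doc['_source']['visualization']['title'])
print(vega_vis_id('def-vega-1'))  # visualization:def-vega-1
```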